THE STAR*CORE SC140 DSP CORE 
Background Information 

OVERVIEW 

For a digital signal processor to set new standards in performance, or power 
consumption, or die size, or code density, or ease of development, is exceptional. But to 
raise the bar in all of these areas simultaneously is practically unheard of. Yet that is what 
the Star*Core™ SC140 DSP core achieves. Designed to power next-generation system-
on-a-chip designs for advanced communication applications, the SC140 core is a best-of-
all-worlds proposition. With its four multiply-accumulate (MAC) units, the 16-bit core sets
new standards for high-end DSP performance, exceeding one billion MACs per second. 

At the same time, the SC140 is extraordinarily efficient in its use of power, silicon and
program code. 

To top it off, the SC140 core has a remarkably compiler-friendly design (unlike most
DSPs), and it is supported by a set of outstanding development tools, giving customers the 
ability to create advanced DSP applications more rapidly and cost-effectively. The result is 
a DSP core that exemplifies the phrase “efficient compilability” as no previous DSP core 
has. In other words, the SC140 enables an unprecedented degree of high-level language
programming while at the same time delivering best-in-class code density, performance and 
power consumption. 

The result is a uniquely versatile DSP engine. The SC140 has the sheer horsepower
required by the most demanding multichannel communication applications, such as 
infrastructure equipment in high-speed networks. By the same token, it meets the size, 
power and system cost requirements of subscriber devices such as next-generation digital 
cellular phones (while delivering much higher performance than previous subscriber-class 
DSPs). No DSP core has ever covered so many bases. 

An Unprecedented Alliance

The SC140 core is the first implementation of the Star*Core SC100 architecture. Both
the SC140 core and SC100 architecture are the products of a uniquely synergistic
collaboration between two DSP leaders, Lucent Technologies and Motorola. The 
Star*Core Alliance, as the partnership is known, was formed in 1998 to develop new, 
superior DSP core architectures and development tools for future communications, 
transportation and consumer electronics applications. The effort brings unprecedented 
resources to bear in the creation of an industry-standard DSP architecture. The Star*Core 
Alliance pools some of the industry’s most experienced and expert DSP engineers, and it 
provides the critical mass needed to attract broad third-party support, giving customers 
more choices in areas such as development tools, operating systems and application 
software. 

The Star*Core Alliance will concentrate its efforts on the fundamental architecture and 
core implementations thereof, along with associated development tools. The two 
Star*Core partners will separately develop their own system-on-a-chip products by 
combining Star*Core cores with a variety of on-chip peripherals, such as coprocessors and 
accelerators, memories, input/output interfaces and system integration modules. The first 
Star*Core-based chip products are expected to debut in the year 2000, with baseline 
development boards planned for the fourth quarter of 1999. 

STAR*CORE SC140 BENEFITS


Efficient Compilability

The SC140 was created with extensive involvement of compiler experts, who helped
design a DSP core that is among the most compilable in the industry (even while excelling
at performance, power consumption and code density). Indeed, the SC140 is the first
compilable DSP core for wireless subscriber applications. System designers who in the
past had to make do with assembly language programming can now take advantage of
modern compiler-based C or C++ coding. Not only can a much higher percentage of
programming be done in high-level languages than with many previous DSP cores, but the 
resulting compiled code has outstanding performance (comparable to assembly code 
running on other DSPs) and excellent code density. 

Among the features that make the core so well-suited to compilation are its orthogonal 
instruction set (i.e., its non-restrictive use of registers), its assortment of three-operand 
instructions and its support for both integer and fractional data types. 

Unmatched Performance

The SC140 boasts the highest performance of any DSP core to date, owing largely to
the extraordinary parallelism provided by the core’s twelve data execution units, consisting
of four general arithmetic logic units (ALUs), four bit field units (BFUs) and, most
notably, four multiply-accumulate (MAC) units. This is twice the number of MAC units
featured on any previous DSP core, furnishing two or more times the performance in such
essential DSP tasks as finite impulse response and infinite impulse response (FIR and IIR)
filters and fast Fourier transforms (FFTs). And compared to DSP cores typically used in
wireless handsets, the SC140 provides roughly four times the performance, making the
SC140 an ideal enabler for emerging third-generation (3G) wireless technology.
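As a rough illustration of why four MAC units help with FIR filtering, the C sketch below computes four filter outputs on each pass of the tap loop, so every tap contributes four independent multiply-accumulates. The function name fir_block4, its argument layout and the 64-bit C accumulators (standing in for wider hardware accumulators) are assumptions made for this example. They are not part of any Star*Core tool, and the sketch does not claim to show how a Star*Core compiler would schedule the loop.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical block FIR: y[i] = sum over k of h[k] * x[i + n_taps - 1 - k].
 * Assumptions: x holds n_out + n_taps - 1 samples and n_out is a multiple
 * of four; any leftover outputs (n_out % 4) are simply not computed here.
 */
void fir_block4(const int16_t *x, const int16_t *h, int64_t *y,
                size_t n_out, size_t n_taps)
{
    for (size_t i = 0; i + 4 <= n_out; i += 4) {
        /* Four independent accumulators, one per output sample. */
        int64_t acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;

        for (size_t k = 0; k < n_taps; k++) {
            const int16_t *xs = &x[i + n_taps - 1 - k];
            int64_t coeff = h[k];

            /* Four multiply-accumulates with no dependency between them. */
            acc0 += coeff * xs[0];
            acc1 += coeff * xs[1];
            acc2 += coeff * xs[2];
            acc3 += coeff * xs[3];
        }

        y[i]     = acc0;
        y[i + 1] = acc1;
        y[i + 2] = acc2;
        y[i + 3] = acc3;
    }
}
```

With x = {1, 2, 3, 4, 5, 6} and h = {1, 2, 3}, the four outputs are 10, 16, 22 and 28.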

At its initial clock speed of 300 megahertz (MHz), the SC140 can execute as many as
1.2 billion multiply-accumulate operations per second (four MACs per clock at 300 MHz).
It is important to note that while some DSPs have claimed high effective MAC rates using
SIMD (single instruction, multiple data) schemes, this approach can only be used with
restrictions and typically cannot be sustained for any length of time. It imposes high set-up
overhead and requires convoluted programming incompatible with compiler use.

The SC140 has 16 function units in all. In addition to the twelve data execution units,
the core contains two address arithmetic units (AAUs), one bit manipulation unit (BMU)
and one branch unit. Overall, the SC140 can issue and execute up to six instructions per
clock (e.g., four MACs and two moves). This equates to ten instructions on some
competing DSP cores, which use simpler RISC-style instructions. (For example, such
chips require separate instructions for the multiply and accumulate phases of the MAC, as
opposed to the SC140, which needs only one instruction for the complete multiply/accumulate
operation.) For purposes of comparison, the Star*Core can be said to perform 3000 RISC
MIPS (ten RISC operations per cycle at 300 MHz), again superior to any previous DSP.
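The peak figures quoted above follow from simple arithmetic, sketched here in C. The function names are hypothetical. Counting each MAC as two RISC-style operations and each move as one is an assumption chosen to match the text's figure of ten operations per cycle.

```c
#include <stdint.h>

/* Peak MAC rate: one MAC per unit per clock. */
uint64_t peak_macs_per_second(uint64_t clock_hz, unsigned mac_units)
{
    return clock_hz * mac_units;
}

/* Assumed counting rule: a MAC is a multiply plus an accumulate (two
 * RISC-style operations); every other instruction counts as one. */
unsigned risc_ops_per_cycle(unsigned macs, unsigned other_instructions)
{
    return 2u * macs + other_instructions;
}

/* RISC-equivalent MIPS for a given clock and per-cycle operation count. */
uint64_t risc_equivalent_mips(uint64_t clock_hz, unsigned ops_per_cycle)
{
    return (clock_hz / 1000000u) * ops_per_cycle;
}
```

For a 300 MHz clock, four MAC units give 1,200,000,000 MACs per second, and four MACs plus two moves count as ten RISC-style operations per cycle, or 3000 RISC-equivalent MIPS.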

Not only is the SC140’s peak performance higher, but the core is also better at 
sustaining this high performance over time, owing to the flexibility of its data execution 
units. Up to four of the data execution units can operate simultaneously in any 
combination. For example, the core could execute four multiply-accumulate operations in a 
single clock, or one MAC, two arithmetic/logical operations and one bit field operation. All 
four MACs are identical, as are the four ALUs and the four BFUs. This permits great 
flexibility in the assignment and execution of instructions, increasing the likelihood that 
four execution units can be kept busy on any given cycle and enabling programs to take 
better advantage of the core’s parallel architecture. Other DSPs lack the SC140’s
versatility, providing, for instance, a multiplier and an adder-subtractor as opposed to the 
SC140’s general-purpose ALUs. This functional specialization places tight restrictions on 
the mix of instructions that can be executed simultaneously, reducing effective parallelism
and sustained performance. 

Low Power

The SC140 draws only 0.11 milliamperes (mA) per MIPS at 1.5 volts (V) and 0.055 mA
per MIPS at 0.9 V, a far better power-to-performance ratio than any competing DSP core.
In fact, the SC140 is the first DSP core in its performance class that is energy-efficient
enough to be
used in low-power portable devices such as digital cellular phones, where battery life is 
crucial. The core’s low power consumption also allows greater on-chip integration, since 
more resources can be packed onto a die without exceeding the thermal limitations of low- 
cost packaging. 

The SC140’s power efficiency results from such factors as low-power circuit design,
power-conserving standby modes and the ability to power up or power down each of the
core’s function units individually on a clock-by-clock basis. Only those units in use on any
given clock cycle receive a clock signal; idle units draw no power whatsoever. The core’s
high parallelism also saves power, since its large number of function units (sixteen total,
six of which can be used in a given clock cycle) allows more work to be done in a given
clock cycle without a proportional increase in current.
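To put the current-per-MIPS figures in more familiar terms, the following C sketch converts them to supply current and power. The function names are hypothetical, and the 300 MIPS workload in the note below is an illustrative assumption rather than a figure from the text.

```c
/* Supply current in mA for a workload, given a current-per-MIPS rating. */
double supply_current_ma(double ma_per_mips, double mips)
{
    return ma_per_mips * mips;
}

/* Power in milliwatts: current (mA) multiplied by supply voltage (V). */
double power_mw(double ma_per_mips, double volts, double mips)
{
    return supply_current_ma(ma_per_mips, mips) * volts;
}
```

Under the assumed 300 MIPS workload, 0.11 mA per MIPS at 1.5 V corresponds to 33 mA, or about 49.5 mW, while 0.055 mA per MIPS at 0.9 V corresponds to 16.5 mA, or about 14.85 mW.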

Small Die Size, High Code Density, Low Cost

The SC140 uses less die area than other DSP cores in its performance class, and it
beats all DSP cores in code compactness, whether compiled or assembled, control code or
DSP algorithms. The SC140’s code is more than twice as dense as that of competing
high-performance DSPs and compares favorably to that of the best microprocessors. By
cutting down on the quantity of code required, the Star*Core SC140 will allow entire
programs to be squeezed into smaller on-chip memories, further reducing chip and system
costs and enabling applications that would otherwise be impossible because of memory
limitations. Compact code also reduces the memory bandwidth required, thus reducing
power dissipation.

One key reason for this compactness is that the Star*Core SC100 architecture uses 16-
bit instructions, as opposed to the 32-bit instructions used by some other DSPs (which
drive up code size and system costs). At the same time, the Star*Core architecture achieves
high parallelism by allowing multiple one-, two- or three-word instructions to be issued
simultaneously, along with prefixes which extend the capabilities of these instructions.

The SC140 can issue from one to six 16-bit instructions and from zero to two prefixes per
clock cycle. The number of instructions and prefixes issued can vary on an instruction-by-
instruction basis, thanks to the SC140’s variable-length execution set (VLES) model. If,
for instance, the core is executing some sequential control code in which only one 
instruction can be performed at a time, then only one instruction is issued. That instruction 
is dispatched to the appropriate execution unit, and the rest remain idle. If the core is 
performing some highly parallel DSP algorithm in which six instructions (four MACs and 
two moves, say) can be performed simultaneously, then six instructions are issued. 

The virtue of the VLES model is the way it combines excellent code density (because it 
issues only as many instructions as are useful) and high performance (because it allows a 
large number of instructions to be issued in parallel).
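The code-density side of the VLES argument can be sketched with a little arithmetic in C. The model below counts 16-bit words per execution set. The fixed-width packet used for comparison is a purely hypothetical machine that always fetches a full packet of 32-bit slots; it is not a description of any particular competing DSP, and the function names are invented for this example.

```c
#include <stddef.h>

/* Bytes occupied by one variable-length execution set made of 16-bit
 * instruction words and 16-bit prefix words. */
size_t vles_set_bytes(size_t instruction_words, size_t prefix_words)
{
    return 2u * (instruction_words + prefix_words);
}

/* Bytes occupied by one packet on a hypothetical fixed-width machine
 * that always emits every slot, used or not. */
size_t fixed_packet_bytes(size_t slots, size_t bytes_per_slot)
{
    return slots * bytes_per_slot;
}
```

In this model a lone single-word control instruction costs 2 bytes, and a full set of six single-word instructions plus two prefixes costs 16 bytes (128 bits). The hypothetical six-slot, 32-bit machine would spend 24 bytes on every packet regardless of how many slots do useful work.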

Scalability

As mentioned earlier, the SC140 is just the first in a growing family of compatible DSP
cores designed to meet a wide range of price and performance requirements. This
unequaled breadth is made possible by the unmatched scalability of the SC100 architecture.
Scalability is a major benefit to customers, allowing them to use a single compatible 
architecture across a wide array of products, maximizing their return on investment through 
reuse of software, algorithms, design tools and system-on-a-chip solutions. 

A number of attributes make this scalability possible. Most important is the 
architecture’s ability to accommodate any number of execution units, such as MACs, which 
can number from one to eight or more. The number of addressing units and registers can 
likewise be varied, and the basic instruction set can be augmented with instruction set
architecture (ISA) extensions (e.g., for emerging applications such as 3G wireless 
systems). 

Also crucial is the fact that the maximum size of the execution set can be varied in 
accordance with the number of parallel execution units. If, for example, a core were 
designed with eight MACs, the execution set could be widened to a maximum of eight 
instructions. That is possible because the SC100 architecture does not explicitly encode
which execution unit each instruction is to be sent to (this decision is left up to the core’s 
program control unit). Some other DSPs, by contrast, explicitly assign instructions to 
particular execution units. This scheme effectively limits the number of functional units 
that can be used at any given time and thus holds the number of instructions that can be 
issued to a number far below what’s possible with the SC100 architecture.

STAR*CORE SC140 FEATURES


The SC140 is a 16-bit DSP core, available initially at a clock speed of 300 MHz and an
operating voltage range of 0.9 to 1.5 V. Faster clock speeds and lower-voltage versions will
be developed in the future. 

Functional Units 

At the heart of the core are its twelve data execution units, consisting of four multiply- 
accumulate (MAC) units, four general-purpose arithmetic and logic units (ALUs) and four 
bit field units (BFUs). 

As their name implies, the MAC units perform the multiply-accumulate operation that is 
the foundation of most DSP algorithms. Each MAC unit can execute a 16-by-16-bit 
fractional or integer multiplication and add the result to a 40-bit accumulator in a single 
clock cycle. The ALUs perform general calculations such as adds, subtracts, compares and 
maximum value operations. The BFUs perform bit field functions. Each BFU
incorporates a 40-bit barrel shifter to speed such operations as multi-bit shifts, bit rotations 
and inserts (especially useful in communications processing). The integration of four such 
barrel shifters on a single DSP core is unique and contributes to the SC140’s superior
performance. 
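The multiply-accumulate operation described above can be modelled in portable C as follows. This is a sketch under stated assumptions, not the core's documented behavior: the accumulator is assumed to wrap around at 40 bits (the text does not say whether the hardware wraps or saturates), the fractional format is assumed to be Q15 operands with a doubled product, and the names mac_int, mac_frac and wrap40 are invented for this example.

```c
#include <stdint.h>

#define ACC_BITS 40
#define ACC_MASK ((UINT64_C(1) << ACC_BITS) - 1)   /* low 40 bits   */
#define ACC_SIGN (UINT64_C(1) << (ACC_BITS - 1))   /* bit 39 (sign) */

/* Reduce a 64-bit value to a sign-extended 40-bit value.
 * Assumption: overflow wraps around rather than saturating. */
static int64_t wrap40(int64_t x)
{
    uint64_t u = (uint64_t)x & ACC_MASK;
    return (int64_t)(u ^ ACC_SIGN) - (int64_t)ACC_SIGN;
}

/* Integer MAC: acc + a * b, kept to 40 bits. */
int64_t mac_int(int64_t acc, int16_t a, int16_t b)
{
    return wrap40(acc + (int64_t)a * b);
}

/* Fractional MAC: Q15 x Q15 operands; the product is doubled so that
 * it lands in Q31 format before being added to the accumulator. */
int64_t mac_frac(int64_t acc, int16_t a, int16_t b)
{
    return wrap40(acc + (int64_t)a * b * 2);
}
```

For example, mac_frac(0, 16384, 16384) multiplies 0.5 by 0.5 in Q15 and yields 536870912, which is 0.25 in Q31. The extra accumulator bits above bit 31 act as headroom, so the one product that does not fit in 32 bits (-1.0 times -1.0) is still held exactly.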

In addition to its data execution units, the SC140 has two address arithmetic units
(AAUs), which perform data moves and address calculations; one bit manipulation unit 
(BMU), for bit-level operations; and one branch unit, providing four hardware nested do- 
loops. Any four of the twelve data execution units can be used on any clock cycle, along 
with two of the other functional units, enabling a total of up to six instructions to be 
completed per clock. 
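The issue rule stated above (any four data execution units plus two of the other functional units per clock) can be written as a small checker. This is a simplified model built only from the unit counts given in the text. The exec_set structure and can_issue function are hypothetical, and real grouping rules on the core may involve constraints not described here.

```c
#include <stdbool.h>

/* Number of instructions of each kind in one proposed execution set. */
typedef struct {
    unsigned mac;     /* multiply-accumulate operations        */
    unsigned alu;     /* arithmetic/logical operations         */
    unsigned bfu;     /* bit field operations                  */
    unsigned aau;     /* moves and address calculations        */
    unsigned bmu;     /* bit manipulation operations           */
    unsigned branch;  /* branch or loop-control operations     */
} exec_set;

/* True if the set fits the limits described in the text: at most four
 * data operations in any mix, and at most two operations on the other
 * units, bounded by two AAUs, one BMU and one branch unit. */
bool can_issue(const exec_set *s)
{
    unsigned data  = s->mac + s->alu + s->bfu;
    unsigned other = s->aau + s->bmu + s->branch;

    return data <= 4 && other <= 2 &&
           s->aau <= 2 && s->bmu <= 1 && s->branch <= 1;
}
```

Under this model, four MACs with two moves is accepted, as is one MAC, two ALU operations and one bit field operation; five data operations, or three operations on the non-data units, are rejected.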

The core’s program control unit includes a program sequencer, which fetches 
instructions and performs loop and branch control. The SC140 has a five-stage pipeline
consisting of program pre-fetch, program fetch, dispatch/decode, address generation and 
execute. This is a relatively short pipeline by DSP standards, and it makes for easier
assembly language programming and more efficient branch and interrupt handling. 

Registers and Buses 

The core provides sixteen 40-bit general purpose data registers and twenty-seven 32-bit 
address registers (sixteen of them general-purpose). The core’s two 64-bit data buses 
allow up to eight 16-bit data words to be fetched at once, for a total bandwidth of 4.8 GB
per second at 300 MHz. The program data bus is 128 bits wide, allowing the core to fetch up to two
prefixes and six instructions per cycle. 
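The bus figures above can be checked with a few lines of C. The function names are hypothetical, and the calculation assumes one transfer per bus per clock at the initial 300 MHz clock speed.

```c
#include <stdint.h>

/* Peak data bandwidth in bytes per second, assuming every bus completes
 * one full-width transfer on every clock. */
uint64_t data_bandwidth_bytes_per_s(unsigned buses, unsigned bus_bits,
                                    uint64_t clock_hz)
{
    return (uint64_t)buses * (bus_bits / 8u) * clock_hz;
}

/* Number of 16-bit words that fit in one transfer on a bus. */
unsigned words16_per_transfer(unsigned bus_bits)
{
    return bus_bits / 16u;
}
```

Two 64-bit buses at 300 MHz give 2 x 8 bytes x 300,000,000 = 4,800,000,000 bytes per second, and together they carry eight 16-bit data words per clock. The 128-bit program bus likewise holds eight 16-bit words, enough for two prefixes and six instructions.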

Support for Peripherals and Accelerators 

The core includes a variety of features which make it easy to interface to on-chip (and 
off-chip) peripherals. A bit mask unit, for instance, enables the core to address every bit in 
any register or in memory, which is useful for such interface-related functions as the 
checking of status bits and the setting of control bits. Also valuable is the core’s support 
for bit- and byte-oriented control operations. Bit-level operations are provided by the 
core’s bit manipulation unit (lacking on many other DSP cores) and four bit field units. 
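The kinds of bit-level and bit-field operations mentioned above (checking status bits, setting control bits, extracting and inserting fields) look like this when written in plain C. The helper names are invented for illustration and operate on ordinary 32-bit values. They are not SC140 instructions or intrinsics, and the register layout in the note is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask with the low `width` bits set; width must be 1..32. */
static uint32_t field_mask(unsigned width)
{
    return (width >= 32) ? UINT32_MAX : ((UINT32_C(1) << width) - 1);
}

/* Test one status bit (bit must be 0..31). */
bool bit_is_set(uint32_t word, unsigned bit)
{
    return ((word >> bit) & 1u) != 0;
}

/* Set one control bit (bit must be 0..31). */
uint32_t bit_set(uint32_t word, unsigned bit)
{
    return word | (UINT32_C(1) << bit);
}

/* Extract `width` bits starting at bit `pos` (pos + width <= 32). */
uint32_t bf_extract(uint32_t word, unsigned pos, unsigned width)
{
    return (word >> pos) & field_mask(width);
}

/* Replace `width` bits starting at bit `pos` with the low bits of value. */
uint32_t bf_insert(uint32_t word, unsigned pos, unsigned width,
                   uint32_t value)
{
    uint32_t mask = field_mask(width) << pos;
    return (word & ~mask) | ((value << pos) & mask);
}
```

For instance, with a hypothetical peripheral register value of 0xABCD1234, bf_extract(value, 8, 8) returns 0x12, and bf_insert(0, 4, 8, 0xAB) returns 0xAB0.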

The SC140 instruction set supports standard byte operations and byte moves.

The SC140 has a built-in test-and-set capability, useful for synchronization with
peripherals and algorithm accelerators (on-chip accelerators that are employed, for instance, 
to speed the processing of specific communications protocols). In addition, the instruction 
and data buses have been made accessible outside the core to instruction set architecture 
(ISA) extensions, market-specific function units that add new instructions to accelerate
particular tasks (e.g., a revision of evolving 3G wireless standards). Because of their tight
coupling to the core, such ISA extensions are more efficient and deliver a greater
performance boost than the on-chip function accelerators employed by traditional DSPs
(which lack support for ISA extensions).
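To show how a test-and-set primitive supports synchronization with a peripheral or accelerator, the sketch below uses the standard C11 atomic_flag type from <stdatomic.h>. It illustrates the general technique only. The device_lock type and function names are hypothetical, and the text does not describe how the SC140's own test-and-set capability is exposed to C code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* A one-bit ownership flag guarding a shared resource, such as a
 * hypothetical on-chip accelerator. Initialize with ATOMIC_FLAG_INIT. */
typedef struct {
    atomic_flag busy;
} device_lock;

/* Atomically set the flag and report whether this caller got ownership.
 * atomic_flag_test_and_set returns the previous value, so a previous
 * value of "clear" means the resource was free. */
bool device_try_acquire(device_lock *lock)
{
    return !atomic_flag_test_and_set(&lock->busy);
}

/* Give the resource back by clearing the flag. */
void device_release(device_lock *lock)
{
    atomic_flag_clear(&lock->busy);
}
```

A caller that fails to acquire the flag can retry later or do other work. Because the test and the set happen as one indivisible step, two requesters can never both see the resource as free.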

On-Chip Debug Facilities

The SC140 has sophisticated debug support built in, thanks to its Enhanced OnCE™
(on-chip emulation) controller. The core also benefits from its external trace bus, giving
the Star*Core partners (Motorola and Lucent) greater flexibility in the design of trace
buffers and thus enabling them to create SC140-based chips matched to the needs of 
particular markets and customers. 





APPLICATIONS 


The Star*Core SC140 provides a unique combination of advantages for advanced
communications systems. Its unmatched performance delivers the kind of processing power
needed for emerging high-speed telecommunications and networking standards such as
asynchronous transfer mode (ATM), digital subscriber line (DSL) and gigabit Ethernet. 

And unlike existing high-end DSP cores, which are large, costly and power-hungry, the 
SC140 uses remarkably little silicon area, has very compact code requirements and
consumes surprisingly little power. 

From an application point of view, the SC140 bridges two worlds. It provides the
speed needed for high-end multichannel processing in network, wireless and telephony 
infrastructure equipment such as remote access servers, wireless base stations and DSL 
telephone switches. At the same time, the SC140 has the low power dissipation as well as
the economy and compactness (thanks to its best-in-class die utilization and code density) 
required for low-power terminal equipment such as digital cellular phones; wireless 
modems for notebook computers, personal digital assistants (PDAs) and portable Web-
browsing devices; and DSL modems. The core not only meets the efficiency requirements
of these subscriber devices, it greatly outstrips previous subscriber-class DSP cores in 
performance, providing the horsepower needed by new-generation wireless and wireline 
standards such as DSL networks and 2.5/3G digital cellular phones. These high-speed and 
wideband technologies have data rates 10-100 times greater than older standards and thus 
require much faster digital signal processing. With SC140-based DSPs, new-generation 
subscriber devices will be able to take full advantage of these high data rates to offer 
capabilities such as wireless web browsing and real-time video communications. 

The SC140 is equally suited to a variety of non-communication applications. It can
perform bitstream processing and audio decoding in home theater systems, and image 
processing and compression in digital cameras. With its high throughput it could be used 
to integrate multiple communication systems (cellular phone, radio, global positioning
system) in automobiles or provide the signal processing used by intelligent vehicle
systems. These are just a few of many examples.

DEVELOPMENT TOOLS


The Star*Core effort has focused from the outset on ensuring a wide selection of best- 
in-class development tools for Star*Core-based designs. The result is a level of support 
unusual for a new architecture. Customers can choose, for example, from multiple 
compilers, development environments and real-time operating system software. 

The Star*Core Alliance is providing baseline tools such as an assembler, optimizer, 
linker, simulator and C/C++ compiler. The Star*Core compiler conforms to ANSI C and
C++ standards and generates code that is exceptionally compact (approaching the code 
density of the best microprocessors) and high performance (comparable to assembly code 
running on other DSPs). The compiler optimizes code for maximal parallelism, taking full 
advantage of the core’s multiple execution units. The compiler also provides intrinsic 
support for ITU/ETSI primitives—useful for vocoder standard reference code—and 
supports source-level debugging when used with integrated development environments. 
These baseline tools are available now in pre-production versions. 
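For readers unfamiliar with the ITU/ETSI primitives mentioned above, the sketch below shows what two of the fractional basic operations used in vocoder reference code compute: a saturating Q15 multiply and a saturating multiply-accumulate. It is written from the general definition of those operators and uses the prefix sketch_ to make clear that these are illustrative stand-ins. It is neither the official reference source nor the Star*Core compiler's intrinsics.

```c
#include <stdint.h>

/* Saturating 32-bit addition. */
int32_t sketch_l_add(int32_t x, int32_t y)
{
    int64_t sum = (int64_t)x + y;
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}

/* Fractional multiply: Q15 x Q15 -> Q31 with a doubled product.
 * The single overflow case, -1.0 times -1.0, saturates. */
int32_t sketch_l_mult(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;
    if (product == INT32_C(0x40000000)) return INT32_MAX;
    return product * 2;
}

/* Multiply-accumulate built from the two operations above. */
int32_t sketch_l_mac(int32_t acc, int16_t a, int16_t b)
{
    return sketch_l_add(acc, sketch_l_mult(a, b));
}
```

A compiler that recognizes such primitives as intrinsics can replace each call with a short machine instruction sequence instead of a function call, which is why intrinsic support matters for reference code written in this style.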

The baseline tools will be featured in visual integrated development environments 
(IDEs) to be provided by Lucent and Motorola in support of specific SC140-based chip 
products. In addition to the baseline software tools, the IDEs will include real-time source- 
level debugging and profiling tools. IDEs will be available upon introduction of the first 
SC140-derived chips, expected in the year 2000. 

An alternative C/C++ compiler will be provided by Green Hills Software, as part of its
MULTI™ development environment. In addition, third-party suppliers Embedded
Systems Products and Enea OSE Systems will be offering real-time operating systems (the
RTXC™ Operating System and the OSE Operating System, respectively) to support the
SC140 core.

THE FUTURE OF THE SC140 AND STAR*CORE
ARCHITECTURES 


While the SC140 core is the first of the Star*Core Alliance DSP offerings, it is hardly the
last. In the future, the Star*Core Alliance will develop new versions of the SC140,
providing faster clock speeds and lower power consumption. In addition to the SC140
core, the Alliance will continue to develop new, compatible cores based on the SC100
architecture. By varying such design features as the number of on-chip MAC units, the
Alliance will create SC100-class cores to serve additional markets and applications, such as
entry-level DSP and advanced embedded control. The two Star*Core partners, meanwhile,
will separately develop their own system-on-a-chip solutions based on the SC140 and
future SC100-generation cores. The first such SC140-based chips are expected from
Motorola and Lucent in the year 2000. In the longer term, the Star*Core Alliance will
continue to develop new-generation architectures even beyond the SC100, serving the
needs of advanced DSP applications well into the twenty-first century. 